Leveraging Large Language Models and Weak Supervision for Social Media data annotation: an evaluation using COVID-19 self-reported vaccination tweets
The COVID-19 pandemic has presented significant challenges to the healthcare
industry and society as a whole. With the rapid development of COVID-19
vaccines, social media platforms have become a popular medium for discussions
on vaccine-related topics. Identifying and analyzing vaccine-related tweets
can provide valuable insights for public health researchers and policymakers.
However, manual annotation of a large number of tweets is time-consuming and
expensive. In this study, we evaluate the use of Large Language Models, in
this case GPT-4 (March 23 version), and weak supervision to identify COVID-19
vaccine-related tweets, with the purpose of comparing performance against
human annotators. We leveraged a manually curated gold-standard dataset and
used GPT-4 to provide labels without any additional fine-tuning or
instructing, in a single-shot mode (no additional prompting).
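The evaluation step described above reduces, at its core, to agreement statistics between the model's labels and the gold standard. A minimal sketch of that comparison, with invented tweets and labels (the `agreement_metrics` helper and all numbers are our own illustration, not the paper's data):

```python
# Toy sketch of evaluating model-assigned labels against a manually curated
# gold standard. Labels: 1 = vaccine-related tweet, 0 = not. The six
# hypothetical annotations below are placeholders for a real GPT-4 run.

def agreement_metrics(gold, predicted):
    """Accuracy and Cohen's kappa for two binary label sequences."""
    assert len(gold) == len(predicted) and gold
    n = len(gold)
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / n
    # Chance agreement from the marginal label frequencies.
    p_gold = sum(gold) / n
    p_pred = sum(predicted) / n
    p_chance = p_gold * p_pred + (1 - p_gold) * (1 - p_pred)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 1.0
    return accuracy, kappa

gold_labels = [1, 0, 1, 1, 0, 0]   # human annotators
model_labels = [1, 0, 1, 0, 0, 0]  # hypothetical model output
acc, kappa = agreement_metrics(gold_labels, model_labels)
```

Kappa corrects raw accuracy for the agreement two annotators would reach by chance given their label frequencies, which matters when one class dominates.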
Solar Event Tracking with Deep Regression Networks: A Proof of Concept Evaluation
With the advent of deep learning for computer vision tasks, the need for
accurately labeled data in large volumes is vital for any application. The
increasing availability of large amounts of solar image data generated by the
Solar Dynamics Observatory (SDO) mission makes this domain particularly
interesting for the development and testing of deep learning systems. The
currently available labeled solar data are generated by the SDO mission's
Feature Finding Team (FFT) specialized detection modules. The major drawback
of these modules is that detection and labeling are performed at a cadence of
every 4 to 12 hours, depending on the module. Since SDO image data products
are created every 10 seconds, there is a considerable gap between labeled
observations and the continuous data stream. To address this shortcoming, we
trained a deep regression network to track the movement of two solar
phenomena: Active Region and Coronal Hole events. To the best of our
knowledge, this is the first attempt at solar event tracking using a deep
learning approach. Since it is impossible to fully evaluate the performance
of the suggested event tracks with the original data (only partial ground
truth is available), we demonstrate the effectiveness of our approach with
several metrics. With the purpose of generating continuously labeled solar
image data, we present this feasibility analysis showing the great promise of
deep regression networks for this task.
Comment: 8 pages, 5 figures; submitted and accepted for publication at IEEE
Big Data 2019 - SABID Workshop
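The cadence gap the abstract describes can be made concrete with a much-simplified stand-in for the regression tracker: fit a model to the sparse FFT detections of an event's centroid and predict positions at the 10-second image cadence in between. The real system is a deep network operating on image data; the linear fit and all detection times and positions below are our own invented illustration of the interpolation task.

```python
# Simplified sketch: predict an event centroid between sparse detections.
# Detections arrive every 4 hours (14400 s); SDO images every 10 s.

def fit_line(ts, xs):
    """Ordinary least squares for x(t) = a + b * t."""
    n = len(ts)
    mt = sum(ts) / n
    mx = sum(xs) / n
    b = (sum((t - mt) * (x - mx) for t, x in zip(ts, xs))
         / sum((t - mt) ** 2 for t in ts))
    a = mx - b * mt
    return a, b

# Hypothetical centroid x-positions (arcsec) drifting with solar rotation.
det_times = [0.0, 14400.0, 28800.0]  # 4-hour detection cadence, in seconds
det_xs = [100.0, 110.0, 120.0]
a, b = fit_line(det_times, det_xs)

# Predicted position halfway through the first gap (t = 7200 s).
x_mid = a + b * 7200.0
```

A deep regression network replaces the hand-picked linear model with one learned from the images themselves, but the output contract is the same: a position estimate for every unlabeled timestep.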
Estimation of vertical slip rate in an active fault-propagation fold from the analysis of a progressive unconformity at the NE segment of the Carrascoy Fault (SE Iberia)
Many studies have dealt with the calculation of fault-propagation fold growth rates considering a variety of kinematic models, from limb rotation to hinge migration. In most cases, the different geometrical and numerical growth models are based on horizontal pre-growth strata architecture and a constant, known slip rate. Here, we present an estimation of the vertical slip rate of the NE segment of the Carrascoy Fault (SE Iberian Peninsula) from the geometrical modeling of a progressive unconformity developed on alluvial fan sediments with a high depositional slope. The NE segment of the Carrascoy Fault is a left-lateral strike-slip fault with a reverse component belonging to the Eastern Betic Shear Zone, a major structure that accommodates most of the convergence between the Iberian and Nubian tectonic plates in southern Spain. The proximity of this major fault to the city of Murcia underscores the importance of carrying out paleoseismological studies to determine the Quaternary slip rate of the fault, a key geological parameter for seismic hazard calculations. This segment is formed by a narrow fault zone that abruptly joins the northern edge of the Carrascoy Range to the Guadalentin Depression through short, steep alluvial fans of Upper-Middle Pleistocene age. An outcrop in a quarry at the foot of this front reveals a progressive unconformity developed on these alluvial fan deposits, showing the important reverse component of the fault. The architecture of this unconformity is marked by well-developed calcretes on top of some of the alluvial deposits. We determined the age of several of these calcretes by the Uranium-series disequilibrium dating method. The results obtained are consistent with recently published studies on the SW segment of the Carrascoy Fault, which, together with offset channels observed at a few locations, suggest a net slip rate close to 1 m/ka
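The core arithmetic behind a dated-offset slip rate is simple: a vertical separation accrued between two U-series calcrete ages yields a vertical slip rate. A back-of-the-envelope sketch, with entirely illustrative numbers that are not the paper's measurements:

```python
# Illustrative slip-rate arithmetic: vertical offset bracketed by two dated
# surfaces. Converting vertical to net slip would further require the fault
# dip and rake, omitted here. All values are hypothetical.

def vertical_slip_rate(offset_m, age_old_ka, age_young_ka):
    """Vertical slip rate (m/ka) from offset accrued between two dated surfaces."""
    if age_old_ka <= age_young_ka:
        raise ValueError("older surface must predate younger surface")
    return offset_m / (age_old_ka - age_young_ka)

# e.g. 18 m of vertical separation between calcretes dated 70 ka and 40 ka
rate = vertical_slip_rate(offset_m=18.0, age_old_ka=70.0, age_young_ka=40.0)
```

Here 18 m over a 30 ka interval gives 0.6 m/ka of vertical slip; the paper's geometrical modeling of the progressive unconformity refines this idea by accounting for the non-horizontal pre-growth strata.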
Performance analysis of algorithms based on mean-field theory for knapsack-type problems.
We propose a methodology based on mean-field theory for solving large-scale knapsack-type problems with linear and quadratic objective functions. Problems with one to thirty linear constraints are considered. These problems are known in the literature as the knapsack problem, the quadratic knapsack problem, and the multidimensional knapsack problem. They were selected for their simple interpretation and their many real-world applications. For the first two problems, we take instances for which running the exact algorithm is known to be impractical; for the third, we simply take the instances most commonly used to validate algorithm efficiency, for some of which the optimal value is unknown. The essence of the proposed methodology is to find a probability distribution associated with an optimization problem. One of the most widely used is the Boltzmann distribution, which incorporates the objective function and its constraints through Lagrangian relaxation, transforming a discrete problem into a continuous one. However, this distribution by itself is complex and difficult to handle, so a mean-field approximation is made, which consists of choosing, from a set of simple distributions, the one with the smallest difference from the Boltzmann distribution. The optimization problems used to validate the efficiency of the proposed methodology are binary, so the general mean-field distribution presented is suitable for this type. Should this methodology be applied to other kinds of problems, another mean-field distribution fitting them would need to be presented.
The mean-field approach used in this work yields independent equations that estimate the probability of occurrence of each variable through the dual space; that is, by assigning values to the Lagrange multipliers, it is possible to build a probability vector in which each element represents the probability of activating a given variable in a solution of the binary problem. The proposed algorithm is deterministic and capable of finding high-quality solutions on the benchmark problems, with execution times whose orders of magnitude are below those of recently studied algorithms. Objectives and method of study: to distinguish and identify the advantages of using a mean-field probabilistic model for constructing feasible solutions to knapsack-type problems. To this end, we start from the fact that any optimization problem is related to the Boltzmann probability distribution, which is approximated by a much simpler distribution. Given the approximate distribution, a binary solution can be constructed through rounding techniques. Contributions and conclusions: we obtain a fast and effective methodology for constructing feasible solutions to large-scale knapsack-type problems. We address problems with linear constraints, quadratic and linear objective functions, and even multiple constraints. In all these cases, quality solutions are found quickly; on average, as problem size grows, the gap between the best known value and the solution of the proposed methodology tends to decrease. This is because mean-field theory, as its name indicates, works with an averaging scheme, so as the number of variables grows the solutions tend to become more accurate
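The pipeline the abstract describes (Boltzmann distribution, Lagrangian relaxation, independent mean-field probabilities, rounding) can be sketched for the plain 0/1 knapsack problem. This is our own minimal illustration, not the thesis's algorithm: the instance data, the fixed inverse temperature, and the bisection schedule for the multiplier are all assumptions.

```python
# Mean-field sketch for 0/1 knapsack: each variable gets an independent
# Bernoulli probability driven by its Lagrangian score v_i - lam * w_i;
# the multiplier lam is tuned so the EXPECTED weight meets the capacity,
# and the probabilities are then rounded to a feasible binary solution.
import math

def mean_field_knapsack(values, weights, capacity, beta=5.0):
    def probs(lam):
        # Independent (mean-field) activation probabilities: a logistic
        # function of the per-item Lagrangian score.
        return [1.0 / (1.0 + math.exp(-beta * (v - lam * w)))
                for v, w in zip(values, weights)]

    # Bisection on the multiplier: larger lam suppresses heavy items,
    # lowering the expected weight toward the capacity.
    lo, hi = 0.0, max(v / w for v, w in zip(values, weights))
    for _ in range(60):
        lam = (lo + hi) / 2
        exp_weight = sum(p * w for p, w in zip(probs(lam), weights))
        lo, hi = (lam, hi) if exp_weight > capacity else (lo, lam)

    # Round by descending probability while keeping the solution feasible.
    p = probs((lo + hi) / 2)
    order = sorted(range(len(values)), key=lambda i: -p[i])
    x, load = [0] * len(values), 0
    for i in order:
        if load + weights[i] <= capacity:
            x[i], load = 1, load + weights[i]
    return x

solution = mean_field_knapsack(values=[60, 100, 120],
                               weights=[10, 20, 30], capacity=50)
```

The sketch guarantees feasibility but not optimality; the thesis's contribution lies in making this construction fast and accurate on large instances, including quadratic objectives and multiple constraints.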
Pulse of the Pandemic: Iterative Topic Filtering for Clinical Information Extraction from Social Media
The rapid evolution of the COVID-19 pandemic has underscored the need to
quickly disseminate the latest clinical knowledge during a public-health
emergency. One surprisingly effective platform for healthcare professionals
(HCPs) to share knowledge and experiences from the front lines has been social
media (for example, the "#medtwitter" community on Twitter). However,
identifying clinically-relevant content in social media without manual labeling
is a challenge because of the sheer volume of irrelevant data. We present an
unsupervised, iterative approach to mine clinically relevant information from
social media data, which begins by heuristically filtering for HCP-authored
texts and incorporates topic modeling and concept extraction with MetaMap. This
approach identifies granular topics and tweets with high clinical relevance
from a set of about 52 million COVID-19-related tweets from January to mid-June
2020. We also show that because the technique does not require manual labeling,
it can be used to identify emerging topics on a week-to-week basis. Our method
can aid in future public-health emergencies by facilitating knowledge transfer
among healthcare workers in a rapidly-changing information environment, and by
providing an efficient and unsupervised way of highlighting potential areas for
clinical research.
Comment: 24 pages, 5 figures. To be published in the Journal of Biomedical
Informatics
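The iterative loop the abstract outlines (heuristic seed filter, extract characteristic terms of the retained set, re-filter with the expanded vocabulary) can be shown in miniature. In this toy version, MetaMap concept extraction and the topic model are replaced by plain term counting, and the example tweets and seed terms are entirely invented:

```python
# Toy iterative topic filter: seed -> filter -> expand vocabulary -> re-filter.
from collections import Counter

SEED_TERMS = {"intubation", "ards", "ventilator"}  # hypothetical HCP cues
STOPWORDS = {"the", "a", "of", "in", "and", "for", "we", "is"}

def iterate_filter(tweets, seeds, rounds=2, top_k=3):
    relevant = {t for t in tweets if seeds & set(t.lower().split())}
    terms = set(seeds)
    for _ in range(rounds):
        # Most frequent non-stopword terms in the currently relevant set
        # stand in for the paper's topic-model / MetaMap concepts.
        counts = Counter(w for t in relevant for w in t.lower().split()
                         if w not in STOPWORDS)
        terms |= {w for w, _ in counts.most_common(top_k)}
        relevant = {t for t in tweets if terms & set(t.lower().split())}
    return relevant, terms

tweets = [
    "proning before intubation helped our ards patients",
    "proning protocols vary a lot between units",
    "great weather for a walk today",
]
kept, vocab = iterate_filter(tweets, SEED_TERMS)
```

Note how the second tweet contains no seed term but is pulled in once "proning" enters the vocabulary from the first round, while the off-topic tweet stays excluded; this bootstrapping is what lets the approach track emerging topics without manual labels.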